-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(clp-package): Unify the metadata schema for JSON and IR streams. #620
Conversation
WalkthroughThe changes in this pull request involve modifications primarily to the output handling and result processing components across several files. Key updates include renaming constants in BSON document constructions, altering method signatures to accommodate unused parameters, and refining error handling in various methods. The changes reflect a restructuring of how results are stored and retrieved, particularly focusing on the identifiers used in the database schema. Overall, the functionality remains consistent while aligning the code with a new organizational structure. Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
components/log-viewer-webui/server/src/DbManager.js (1)
174-176
: LGTM: Query field aligned with stream_id conventionThe update to use
stream_id
in the MongoDB query is consistent with the standardization effort.Consider adding an index on
stream_id
,begin_msg_ix
, andend_msg_ix
fields if not already present, to optimize this query:await this.#streamFilesCollection.createIndex({ stream_id: 1, begin_msg_ix: 1, end_msg_ix: 1 });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
components/core/src/clp/clo/OutputHandler.cpp
(1 hunks)components/core/src/clp/clo/clo.cpp
(1 hunks)components/core/src/clp/clo/constants.hpp
(1 hunks)components/core/src/clp_s/JsonConstructor.cpp
(1 hunks)components/core/src/clp_s/archive_constants.hpp
(1 hunks)components/log-viewer-webui/client/src/api/query.js
(2 hunks)components/log-viewer-webui/server/src/DbManager.js
(1 hunks)
🧰 Additional context used
📓 Learnings (1)
components/log-viewer-webui/client/src/api/query.js (1)
Learnt from: junhaoliao
PR: y-scope/clp#596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
🔇 Additional comments (6)
components/core/src/clp/clo/constants.hpp (2)
15-15
: Verify SearchOutput namespace organization
The addition of OrigFileId
to SearchOutput namespace is logical, maintaining clear separation between IR and search output constants.
8-8
: Verify usage of new StreamId constant
The addition of StreamId
aligns with the PR objective of standardizing metadata. However, we should verify all usages of this new constant.
✅ Verification successful
No issues found with StreamId constant usage
The new StreamId
constant is properly used in the codebase, and there are no remaining instances of hardcoded "stream_id" strings that need to be updated. The constant is referenced in:
components/core/src/clp/clo/clo.cpp
for results cache keyscomponents/core/src/clp_s/archive_constants.hpp
which defines a similar constantcomponents/webui/imports/ui/SearchView/SearchResults/SearchResultsTable/index.jsx
which uses the stream ID in a different context
The old file_split_id
references are in different components and contexts, not related to this standardization change.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining references to file_split_id to ensure complete migration
rg -i "file_split_id" --type cpp --type hpp
# Search for usage of the new StreamId constant
rg "StreamId\[\]" --type cpp --type hpp
Length of output: 141
Script:
#!/bin/bash
# Search for any remaining references to file_split_id
rg -i "file_split_id"
# Search for usage of the new StreamId constant
rg "StreamId"
# List all hpp and cpp files to understand the codebase structure
fd "\.(hpp|cpp)$"
Length of output: 37011
components/log-viewer-webui/client/src/api/query.js (2)
22-22
: Confirm API response type alignment
The return type change to ExtractStreamResp
appears correct, but we should verify the server-side response structure matches.
✅ Verification successful
API response type is properly defined and aligned
The ExtractStreamResp
type is well-defined in the same file with all required properties (stream_id
, begin_msg_ix
, end_msg_ix
, is_last_ir_chunk
, path
), and the server-side endpoint at /query/extract-stream
handles the corresponding request parameters (extractJobType
, streamId
, logEventIdx
). The type change is safe and correctly implemented.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for extract-stream endpoint implementation
rg -A 10 "extract-stream" --type cpp --type hpp
Length of output: 75
Script:
#!/bin/bash
# Search for extract-stream endpoint implementation in all files
rg -A 10 "extract-stream" -i
# Also search for ExtractStreamResp type definition
rg "ExtractStreamResp" -A 5
Length of output: 2831
5-12
: Verify type definition completeness
The new ExtractStreamResp
type includes is_last_ir_chunk
which seems IR-specific. Consider if this should be in a separate type for IR responses.
✅ Verification successful
Let me gather more context about the usage of this type and related IR functionality.
Based on the gathered context, I can now provide a final response.
The is_last_ir_chunk
property is correctly placed in ExtractStreamResp
The codebase shows that IR (Intermediate Representation) is a specific stream type in the system, with dedicated handling:
- The server has an
EXTRACT_IR
job type distinct fromEXTRACT_JSON
- The client UI checks for IR streams with
IS_IR_STREAM
condition - The
is_last_ir_chunk
property is specifically used when handling IR stream extractions
The property belongs in the ExtractStreamResp
type as it's a necessary part of the IR extraction response flow, not warranting a separate type definition.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usage of ExtractStreamResp type
rg "ExtractStreamResp" --type js --type ts
# Search for references to is_last_ir_chunk
rg "is_last_ir_chunk" --type js --type ts
Length of output: 380
Script:
#!/bin/bash
# Search for the implementation of extract stream functionality
ast-grep --pattern 'function extractStream($$$) { $$$ }'
# Search for IR related code context
rg -B 2 -A 2 "ir" --type js --type ts -i
# Look for the query.js file content
cat components/log-viewer-webui/client/src/api/query.js
Length of output: 31926
components/core/src/clp_s/archive_constants.hpp (1)
32-32
: Verify CLP-S decompression functionality
The replacement of cOrigFileId
with cStreamId
in the decompression namespace aligns with the standardization objective. However, we should verify this change doesn't affect CLP-S decompression functionality.
✅ Verification successful
No impact on CLP-S decompression functionality
The change from cOrigFileId
to cStreamId
is safe as:
- The old constant
cOrigFileId
has no references in the codebase - The new constant
cStreamId
is properly used inJsonConstructor.cpp
for decompression results - The decompression functionality is well-tested through
test-Segment.cpp
which verifies data integrity after compression/decompression cycles - The change aligns with the standardization without affecting the core decompression logic
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usage of the old and new constants
rg "cOrigFileId" --type cpp --type hpp
rg "cStreamId" --type cpp --type hpp
# Look for any decompression-related tests
fd -e cpp -e hpp | rg -i "test.*decompress"
Length of output: 177
Script:
#!/bin/bash
# Search for usage of the old and new constants with correct file extensions
rg "cOrigFileId" -t cc -t h -t cpp -t hpp
rg "cStreamId" -t cc -t h -t cpp -t hpp
# Look for decompression-related tests with proper file extensions
find . -type f \( -name "*.cpp" -o -name "*.hpp" -o -name "*.cc" -o -name "*.h" \) -exec grep -l -i "test.*decompress" {} \;
# Search for decompression-related functionality
rg "decompress" -t cc -t h -t cpp -t hpp -A 5
Length of output: 381
Script:
#!/bin/bash
# Let's examine the test file found and check for usage of constants
cat ./components/core/tests/test-Segment.cpp
# Try searching with ripgrep without file type specifications
rg "cOrigFileId"
rg "cStreamId"
# Look for decompression functionality in the codebase
rg "decompress" -A 5
Length of output: 68017
components/core/src/clp_s/JsonConstructor.cpp (1)
125-125
: LGTM: Consistent with stream_id standardization
The renaming from cOrigFileId
to cStreamId
aligns well with the PR objectives to standardize the metadata scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I think for the title you may just need a space after the ":" character. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
components/core/src/clp/clp/FileDecompressor.hpp (1)
42-42
: LGTM! Consider adding parameter descriptionThe parameter rename from
is_last_ir_chunk
tois_last_chunk
improves clarity. Consider adding a brief description of what this parameter indicates.Add a description like:
- * size_t begin_message_ix, size_t end_message_ix, bool is_last_chunk) -> bool; + * size_t begin_message_ix, size_t end_message_ix, bool is_last_chunk) -> bool; + * @param is_last_chunk Indicates whether this is the final chunk in the sequencecomponents/core/src/clp/clp/decompression.cpp (1)
285-285
: Consider implementing final chunk handlingThe
is_last_chunk
parameter is marked as unused, but it could be valuable for implementing special handling for the final chunk (e.g., cleanup operations, metadata updates, or validation).Consider implementing final chunk handling:
- [[maybe_unused]] bool is_last_chunk) { + bool is_last_chunk) { auto dest_ir_file_name = orig_file_id; dest_ir_file_name += "_" + std::to_string(begin_message_ix); dest_ir_file_name += "_" + std::to_string(end_message_ix); + if (is_last_chunk) { + // Add final chunk handling (e.g., metadata updates, validation) + SPDLOG_DEBUG("Processing final chunk for {}", orig_file_id); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
components/core/src/clp/clo/clo.cpp
(3 hunks)components/core/src/clp/clo/constants.hpp
(1 hunks)components/core/src/clp/clp/FileDecompressor.hpp
(1 hunks)components/core/src/clp/clp/decompression.cpp
(1 hunks)components/core/src/clp_s/JsonConstructor.cpp
(2 hunks)components/core/src/clp_s/archive_constants.hpp
(1 hunks)components/log-viewer-webui/client/src/api/query.js
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- components/core/src/clp/clo/clo.cpp
- components/core/src/clp_s/JsonConstructor.cpp
- components/core/src/clp_s/archive_constants.hpp
- components/log-viewer-webui/client/src/api/query.js
- components/core/src/clp/clo/constants.hpp
🔇 Additional comments (1)
components/core/src/clp/clp/decompression.cpp (1)
Line range hint 287-290
: Verify consistency with stream_id changes
The PR objectives mention replacing orig_file_id
with stream_id
, but the code still uses orig_file_id
in file naming. This might need to be updated for consistency.
Let's verify the usage of these identifiers across the codebase:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the webUI changes lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
refactor(clp-package): Unify the metadata schema for JSON and IR streams.
Description
This PR is a continuation of work in #596
The PR introduces three updates to the stream metadata so that both IR and JSON metadata share the same scheme:
With the changes above, we are able to simplify the webui code in the same PR.
Validation performed
Manually tested both CLP and CLP-S log viewing.
Manually verified that extracted stream metadata in the mongodb matches expectation
Summary by CodeRabbit
Release Notes
New Features
stream_id
property to the response of extraction jobs.Bug Fixes
Documentation
These changes improve the overall functionality and reliability of the application, ensuring better alignment with the updated data structure.